A contrastive review of paraphrase acquisition techniques

نویسندگان

  • Houda Bouamor
  • Aurélien Max
  • Gabriel Illouz
  • Anne Vilnat
چکیده

This paper addresses the issue of what approach should be used for building a corpus of sentential paraphrases depending on one’s requirements. Six strategies are studied: (1) multiple translations into a single language from another language; (2) multiple translations into a single language from different other languages; (3) multiple descriptions of short videos; (4) multiple subtitles for the same language; (5) headlines for similar news articles; and (6) sub-sentential paraphrasing in the context of a Web-based game. We report results on French for 50 paraphrase pairs collected for all these strategies, where corpora were manually aligned at the finest possible level to define oracle performance in terms of accessible sub-sentential paraphrases. The differences observed will be used as criteria for motivating the choice of a given approach before attempting to build a new paraphrase corpus.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Validation of sub-sentential paraphrases acquired from parallel monolingual corpora

The task of paraphrase acquisition from related sentences can be tackled by a variety of techniques making use of various types of knowledge. In this work, we make the hypothesis that their performance can be increased if candidate paraphrases can be validated using information that characterizes paraphrases independently of the set of techniques that proposed them. We implement this as a bi-cl...

متن کامل

Une étude en 3D de la paraphrase: types de corpus, langues et techniques (A Study of Paraphrase along 3 Dimensions : Corpus Types, Languages and Techniques) [in French]

A study of paraphrase along 3 dimensions : corpus types, languages and techniques In this paper, we report a detailed study of the impact of corpus type on the task of sub-sentential paraphrase acquisition. Our experiments are for 2 languages and 4 corpus types, and involve an efficient machine learning-based combination of 4 paraphrase acquisition systems. We obtain relative improvements of mo...

متن کامل

MIPA: Mutual Information Based Paraphrase Acquisition via Bilingual Pivoting

We present a pointwise mutual information (PMI) based approach for formalizing paraphrasability and propose a variant of PMI, called mutual information based paraphrase acquisition (MIPA), for paraphrase acquisition. Our paraphrase acquisition method first acquires lexical paraphrase pairs by bilingual pivoting and then reranks them by PMI and distributional similarity. The complementary nature...

متن کامل

Leveraging Crowdsourcing for Paraphrase Recognition

Crowdsourcing, while ideally reducing both costs and the need for domain experts, is no all-purpose tool. We review how paraphrase recognition has benefited from crowdsourcing in the past and identify two problems in paraphrase acquisition and semantic similarity evaluation that can be solved by employing a smart crowdsourcing strategy. First, we employ the CrowdFlower platform to conduct an ex...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012